Stanford University’s Arabic-to-English Statistical Machine Translation System for the 2009 NIST MT Open Evaluation

Author

  • Christopher D. Manning
Abstract

This document describes Stanford University’s first entry into a NIST Arabic-English MT evaluation. We describe two main improvements over a previous Chinese-English submission (Galley et al., 2008): a hierarchical lexicalized reordering model (Galley and Manning, 2008) and a technique for performing minimum error rate training (Cer et al., 2008) that outperforms the standard Powell method.

1 System Description

1.1 Phrase-based translation system

The core engine of our system is Phrasal, a phrase-based decoder similar to Moses (Koehn et al., 2007). In its baseline configuration and basic set of features, Phrasal replicates Moses almost exactly, and differs only in the way the decoder breaks ties between translation hypotheses that have the same score. Unless otherwise indicated, we use the same default parameters as Moses (e.g., same recombination heuristic, same maximum number of translation options for each input phrase).

Phrasal uses a log-linear approach common to many state-of-the-art statistical machine translation (SMT) systems (Och and Ney, 2004). Given an input Arabic sentence $f$, which is to be translated into an English sentence $e$, the decoder searches for the most probable translation $\hat{e}$ according to the following decision rule:

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(f, e)$$

where the $h_m(f, e)$ are $M$ arbitrary feature functions over sentence pairs, such as translation probabilities, and the $\lambda_m$ are their weights. Our system incorporates the following 17 feature functions:

• Two phrase translation probabilities $P_{ml}(\bar{e} \mid \bar{f})$ and $P_{ml}(\bar{f} \mid \bar{e})$, computed using the (unsmoothed) relative frequency estimate

$$P_{ml}(\bar{e} \mid \bar{f}) = \frac{\mathrm{count}(\bar{e}, \bar{f})}{\sum_{\bar{e}'} \mathrm{count}(\bar{e}', \bar{f})}$$

where $\bar{f}$ and $\bar{e}$ constitute a pair of aligned phrases.

• Two lexical translation probabilities $P_{lex}(\bar{e} \mid \bar{f}, a)$ and $P_{lex}(\bar{f} \mid \bar{e}, a)$, similar to those presented in (Koehn et al., 2003):

$$P_{lex}(\bar{e} \mid \bar{f}, a) = \prod_{i=1}^{n} \frac{1}{|\{j \mid (i, j) \in a\}|} \sum_{(i, j) \in a} p(e_i \mid f_j)$$

where $n$ is the length of the phrase $\bar{e}$, and $a$ is the internal word alignment between $\bar{e}$ and $\bar{f}$. [1]

• Eight hierarchical lexicalized phrase reordering scores for each phrase pair. We select from four types of orientations (monotone, swap, left discontinuous, and right discontinuous) and model both left-to-right and right-to-left reorderings. Laplace smoothing with λ = 0.5 is applied to the lexicalized reordering probabilities. More details about this model can be found in (Galley and Manning, 2008).

• Two language models, from Gigaword and Google n-grams.

• Word penalty as in (Koehn et al., 2007).

• Phrase penalty as in (Koehn et al., 2007).

• Linear reordering penalty as defined in (Koehn et al., 2007).

[1] Distinct instances of a given phrase pair $(\bar{e}, \bar{f})$ may be observed with different internal alignments. In these cases, we select the most frequent alignment (like Moses, but in contrast to (Koehn et al., 2003)). About 0.3% of our phrases have lexical translation probabilities that differ from those of Moses, since our feature extraction implementation breaks ties between alignment counts differently. However, we observe no impact on MT performance.

The weights of these feature functions were set using an improved version of minimum error rate training (MERT) (Och, 2003). Specifically, we used a stochastic method and two regularization strategies described in (Cer et al., 2008), which shows that this approach is superior both to Powell’s method and to the variant of coordinate descent found in the Moses MERT utility. Our system was tuned on MT06 (LDC2007E59). We did not tune different systems for different genres.

The decoder used a distortion limit of 5. The stack size and the n-best list size were both set to 500 (the Moses defaults are 200 and 100 respectively, which we found less effective).
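As a rough illustration of the features described above, the following Python sketch computes the unsmoothed relative-frequency phrase probability, the lexical translation probability in the style of Koehn et al. (2003), and a log-linear score used to pick the best candidate. It is not Phrasal code: the function names, the toy romanized tokens, the feature names `tm` and `lm`, and the handling of unaligned target words through a `NULL` token are all assumptions made for this example.

```python
from collections import Counter

NULL = "NULL"  # assumption: unaligned target words are scored against a NULL source token


def relative_frequency(pair_counts):
    """Unsmoothed estimate: count(e, f) divided by the sum of count(e', f) over all e'."""
    totals = Counter()
    for (e, f), c in pair_counts.items():
        totals[f] += c
    return {(e, f): c / totals[f] for (e, f), c in pair_counts.items()}


def lexical_weight(e_words, f_words, alignment, p_word):
    """Product over target words of the average p(e_i | f_j) over the links (i, j)."""
    score = 1.0
    for i, e_i in enumerate(e_words):
        links = [j for (i2, j) in alignment if i2 == i]
        if links:
            total = sum(p_word.get((e_i, f_words[j]), 0.0) for j in links)
            score *= total / len(links)
        else:
            score *= p_word.get((e_i, NULL), 0.0)
    return score


def loglinear_score(features, weights):
    """Weighted sum of feature values h_m(f, e) with weights lambda_m."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(candidates, weights):
    """Return the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda e: loglinear_score(candidates[e], weights))


if __name__ == "__main__":
    # Hypothetical phrase-pair counts (target phrase, source phrase).
    counts = {("the house", "albayt"): 3, ("house", "albayt"): 1}
    print(relative_frequency(counts))

    # Hypothetical word translation table and a one-to-one internal alignment.
    table = {("the", "al"): 0.5, ("house", "bayt"): 0.8}
    print(lexical_weight(["the", "house"], ["al", "bayt"], {(0, 0), (1, 1)}, table))

    # Hypothetical log-domain feature values for two candidate translations.
    cands = {"the house": {"tm": -1.0, "lm": -2.0}, "house": {"tm": -0.5, "lm": -4.0}}
    print(best_hypothesis(cands, {"tm": 1.0, "lm": 0.5}))
```

A real decoder searches over a very large hypothesis space with beam search rather than enumerating candidates; the `best_hypothesis` helper only shows how the weighted sum ranks a fixed set.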
After decoding, hypotheses are selected using the minimum Bayes risk criterion (Kumar and Byrne, 2004).
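Minimum Bayes risk selection can be sketched as follows: instead of returning the highest-scoring hypothesis, it returns the n-best entry with the highest expected gain (equivalently, the lowest expected loss) against the other entries, weighted by their posterior probabilities. The sketch below is an illustration under stated assumptions, not the system's implementation: posteriors are obtained by a softmax over model scores with a hypothetical `scale` parameter, and a simple unigram F1 overlap stands in for the BLEU-based gain typically used in practice.

```python
import math
from collections import Counter


def posteriors(scores, scale=1.0):
    """Softmax over model scores; `scale` is an assumed tunable scaling factor."""
    top = max(scores)
    exps = [math.exp(scale * (s - top)) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def unigram_f1(hyp, ref):
    """Stand-in gain function (assumption): F1 over whitespace-token overlap."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(nbest, scores, gain=unigram_f1, scale=1.0):
    """Pick the hypothesis with the highest posterior-weighted expected gain."""
    post = posteriors(scores, scale)

    def expected_gain(candidate):
        return sum(p * gain(candidate, other) for other, p in zip(nbest, post))

    return max(nbest, key=expected_gain)


if __name__ == "__main__":
    # Hypothetical 3-best list: the top-scoring entry is an outlier,
    # while the other two largely agree with each other.
    nbest = ["a b c d", "x y z w", "x y z v"]
    scores = [0.0, -0.1, -0.2]
    print(mbr_select(nbest, scores))
```

In this toy list the top-scoring hypothesis shares no words with the others, so the consensus-like second entry is selected instead, which is the behavior MBR is meant to produce.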




Publication date: 2009